Some Distributions Related to the Normal

The Normal and Sums of Normals

The sum of independent normally distributed random variables is also normally distributed.
Details

The sum of independent normally distributed random variables is also normally distributed.
More specifically, if $X_1 \sim N(\mu_1, \sigma_1^2)$ and $X_2 \sim N(\mu_2, \sigma_2^2)$ are independent, then $X_1 + X_2 \sim N(\mu, \sigma^2)$, where $\mu = E[X_1 + X_2] = \mu_1 + \mu_2$ and, because $X_1$ and $X_2$ are independent, $\sigma^2 = Var[X_1 + X_2] = \sigma_1^2 + \sigma_2^2$.
Similarly,

$$\sum_{i=1}^{n} X_i$$

is normal if $X_1, \ldots, X_n$ are normal and independent.
Examples

Example: Simulating and plotting a single normal distribution.
$Y \sim N(0,1)$
library(MASS)                      # for truehist
par(mfcol=c(2,2))                  # set up a 2x2 plot grid
y <- rnorm(1000)                   # generate 1000 N(0,1) values
mn <- mean(y)
vr <- var(y)
truehist(y, ymax=0.5)              # plot the histogram
xvec <- seq(-4, 4, 0.01)           # generate the x-axis
yvec <- dnorm(xvec)                # theoretical N(0,1) density
lines(xvec, yvec, lwd=2, col="red")
ttl <- paste("Simulation and theory N(0,1)\n",
             "mean=", round(mn,2),
             "and variance=", round(vr,2))
title(ttl)
Example: Sum of two normal distributions
$Y_1 \sim N(2, 2^2)$

and

$Y_2 \sim N(3, 3^2)$
y1 <- rnorm(10000, 2, 2)           # N(2, 2^2)
y2 <- rnorm(10000, 3, 3)           # N(3, 3^2)
y <- y1 + y2
truehist(y)
xvec <- seq(-10, 20, 0.01)
mn <- mean(y)
vr <- var(y)
cat("The mean is", mn, "\n")
cat("The variance is", vr, "\n")
cat("The standard deviation is", sd(y), "\n")
yvec <- dnorm(xvec, mean=5, sd=sqrt(13))  # N(5, 13) density: 2+3=5, 4+9=13
lines(xvec, yvec, lwd=2, col="red")
ttl <- paste("The sum of N(2,2^2) and N(3,3^2)\n",
             "mean=", round(mn,2),
             "and variance=", round(vr,2))
title(ttl)
Example: Mean of nine normal distributions, all with $\mu = 42$ and $\sigma^2 = 2^2$. By the results above, the mean of the nine is itself normal, $N(42, 2^2/9)$.

ymat <- matrix(rnorm(10000*9, 42, 2), ncol=9)
y <- apply(ymat, 1, mean)                # mean of each set of nine
truehist(y)
mn <- mean(y)
vr <- var(y)
cat("The mean is", mn, "\n")
cat("The variance is", vr, "\n")
cat("The standard deviation is", sd(y), "\n")
# plot the theoretical curve
xvec <- seq(39, 45, 0.01)
yvec <- dnorm(xvec, mean=42, sd=2/3)     # N(42, (2/3)^2) density for the mean
lines(xvec, yvec, lwd=2, col="red")
ttl <- paste("The mean of nine N(42,2^2)\n",
             "mean=", round(mn,2),
             "and variance=", round(vr,2))
title(ttl)
The Chi-square Distribution

If $X \sim N(0,1)$, then $Y = X^2$ has a distribution which is called the chi-square ($\chi^2$) distribution on one degree of freedom.
This can be written as:

$$Y \sim \chi^2_1$$
Details

If $X_1, X_2, \ldots, X_n$ are i.i.d. $N(0,1)$, then $Y = X_1^2 + X_2^2 + \ldots + X_n^2$ has a chi-square ($\chi^2$) distribution on $n$ degrees of freedom, $Y \sim \chi^2_n$.
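For illustration, here is a simulation sketch in the style of the examples above (not part of the original examples; the choice of three degrees of freedom and 10000 replicates is arbitrary). Squaring and summing three independent $N(0,1)$ draws should match the $\chi^2_3$ density:

library(MASS)                           # for truehist
xmat <- matrix(rnorm(10000*3), ncol=3)  # 10000 rows of three N(0,1) draws
y <- apply(xmat^2, 1, sum)              # each row sum of squares is chi-square(3)
truehist(y)
xvec <- seq(0, 20, 0.01)
yvec <- dchisq(xvec, df=3)              # theoretical chi-square(3) density
lines(xvec, yvec, lwd=2, col="red")
title("Sums of three squared N(0,1) vs. chi-square(3) density")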
Sum of Chi-square Distributions

Let $Y_1$ and $Y_2$ be independent random variables. If $Y_1 \sim \chi^2_{\nu_1}$ and $Y_2 \sim \chi^2_{\nu_2}$, then the sum of these two variables also follows a chi-square ($\chi^2$) distribution:

$$Y_1 + Y_2 \sim \chi^2_{\nu_1 + \nu_2}$$
Details

Recall that if $X_1, \ldots, X_n \sim N(\mu, \sigma^2)$ are i.i.d., then

$$\sum_{i=1}^n \left(\frac{X_i - \mu}{\sigma}\right)^2 = \sum_{i=1}^n \frac{(X_i - \mu)^2}{\sigma^2} \sim \chi^2_n$$
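As a sketch (not from the original text; the degrees of freedom 2 and 3 are arbitrary choices), the additivity can be checked by simulation:

library(MASS)                      # for truehist
y1 <- rchisq(10000, df=2)          # chi-square on 2 df
y2 <- rchisq(10000, df=3)          # chi-square on 3 df, independent of y1
y <- y1 + y2
truehist(y)
xvec <- seq(0, 25, 0.01)
yvec <- dchisq(xvec, df=5)         # degrees of freedom add: 2 + 3 = 5
lines(xvec, yvec, lwd=2, col="red")
title("Sum of chi-square(2) and chi-square(3) vs. chi-square(5) density")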
Sum of Squared Deviations

If $X_1, \cdots, X_n \sim N(\mu, \sigma^2)$ i.i.d., then

$$\sum_{i=1}^n \left(\frac{X_i - \mu}{\sigma}\right)^2 \sim \chi^2_n,$$

but we are often interested in

$$\frac{1}{\sigma^2}\sum_{i=1}^n (X_i - \bar{X})^2 \sim \chi^2_{n-1}$$
Details

Consider a random sample of Gaussian random variables, i.e. $X_1, \cdots, X_n \sim N(\mu, \sigma^2)$ i.i.d. Such a collection of random variables has properties which can be used in a number of ways. As above,

$$\sum_{i=1}^n \left(\frac{X_i - \mu}{\sigma}\right)^2 \sim \chi^2_n,$$

but we are often interested in

$$\frac{1}{\sigma^2}\sum_{i=1}^n (X_i - \bar{X})^2 \sim \chi^2_{n-1}$$
A degree of freedom is lost because the sample mean $\bar{X}$, an estimator of $\mu$, is subtracted rather than the true mean (see the simulation sketch below).
The correct notation is:
$\mu$ := population mean
$\bar{X}$ := sample mean (a random variable)
$\bar{x}$ := sample mean (a number)
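The loss of a degree of freedom can be checked by simulation. The following sketch (not part of the original examples; n=5, mu=10, and sigma=2 are arbitrary choices) compares the scaled sums of squared deviations with the $\chi^2_{n-1}$ density:

library(MASS)                                # for truehist
n <- 5; mu <- 10; sigma <- 2
xmat <- matrix(rnorm(10000*n, mu, sigma), ncol=n)
xbar <- apply(xmat, 1, mean)                 # sample mean of each row
ssq <- apply((xmat - xbar)^2, 1, sum)        # sum of squared deviations per row
y <- ssq/sigma^2
truehist(y)
xvec <- seq(0, 20, 0.01)
yvec <- dchisq(xvec, df=n-1)                 # chi-square on n-1 df, not n
lines(xvec, yvec, lwd=2, col="red")
title("Scaled sums of squared deviations vs. chi-square(4) density")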
The $T$ Distribution

If $U \sim N(0,1)$ and $W \sim \chi^2_\nu$ are independent, then the random variable

$$T = \frac{U}{\sqrt{W/\nu}}$$

has a distribution which we call the $T$ distribution on $\nu$ degrees of freedom, denoted $T \sim t_\nu$.
Details

If $U \sim N(0,1)$ and $W \sim \chi^2_\nu$ are independent, then the random variable

$$T := \frac{U}{\sqrt{W/\nu}}$$

has a distribution which we call the $T$ distribution on $\nu$ degrees of freedom, denoted $T \sim t_\nu$.
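The definition can be checked directly by simulation. A minimal sketch (not from the original text; nu=4 is an arbitrary choice) compares $U/\sqrt{W/\nu}$ with the $t_\nu$ density:

library(MASS)                      # for truehist
nu <- 4
u <- rnorm(10000)                  # U ~ N(0,1)
w <- rchisq(10000, df=nu)          # W ~ chi-square(nu), independent of u
tv <- u/sqrt(w/nu)                 # construct T as in the definition
truehist(tv, xlim=c(-6, 6))
xvec <- seq(-6, 6, 0.01)
yvec <- dt(xvec, df=nu)            # theoretical t density on nu df
lines(xvec, yvec, lwd=2, col="red")
title("Simulated U/sqrt(W/nu) vs. t(4) density")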
It turns out that if $X_1, \ldots, X_n \sim N(\mu, \sigma^2)$ and we set

$$\bar{X} = \frac{1}{n}\sum_{i=1}^n X_i$$

and

$$S = \sqrt{\frac{1}{n-1}\sum_{i=1}^n (X_i - \bar{X})^2},$$

then

$$\frac{\bar{X} - \mu}{S/\sqrt{n}} \sim t_{n-1}$$
This follows from $\bar{X}$ and $\sum_{i=1}^n (X_i - \bar{X})^2$ being independent, with $\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0,1)$ and $\sum_{i=1}^n \frac{(X_i - \bar{X})^2}{\sigma^2} \sim \chi^2_{n-1}$.
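A simulation sketch of this result (not part of the original examples; n=5, mu=10, and sigma=2 are arbitrary choices): compute the t statistic for many samples and compare with the $t_{n-1}$ density.

library(MASS)                            # for truehist
n <- 5; mu <- 10; sigma <- 2
xmat <- matrix(rnorm(10000*n, mu, sigma), ncol=n)
xbar <- apply(xmat, 1, mean)             # sample mean of each sample
s <- apply(xmat, 1, sd)                  # sample standard deviation of each sample
tv <- (xbar - mu)/(s/sqrt(n))            # one t statistic per sample
truehist(tv, xlim=c(-6, 6))
xvec <- seq(-6, 6, 0.01)
yvec <- dt(xvec, df=n-1)                 # theoretical t density on n-1 df
lines(xvec, yvec, lwd=2, col="red")
title("Simulated t statistics vs. t(4) density")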